Abstract

The MaxEnt inference algorithm and information theory are relevant for the time evolution of macroscopic
systems, considered as a problem of incomplete information. Two different MaxEnt approaches are introduced
in this work, both applied to the prediction of time evolution for closed Hamiltonian systems. The first one is
based on the Liouville equation for the conditional probability distribution, introduced as a strict microscopic
constraint on time evolution in phase space. The conditional probability distribution is defined for the 
set of microstates associated with the set of phase space paths determined by solutions of Hamilton's 
equations. The MaxEnt inference algorithm with Shannon's concept of the conditional information entropy 
is then applied to prediction, consistently with this strict microscopic constraint on time evolution in phase 
space. The second approach is based on the same concepts, with the difference that the Liouville equation for the
conditional probability distribution is introduced as a macroscopic constraint given by a phase space average. 
We consider the incomplete nature of our information about microscopic dynamics in a rational way that 
is consistent with Jaynes' formulation of predictive statistical mechanics, and the concept of macroscopic 
reproducibility for time dependent processes. Maximization of the conditional information entropy subject 
to this macroscopic constraint leads to a loss of correlation between the initial phase space paths and final 
microstates. Information entropy is the theoretic upper bound on the conditional information entropy, with 
the upper bound attained only in case of the complete loss of correlation. In this alternative approach to 
prediction of macroscopic time evolution, maximization of the conditional information entropy is equivalent 
to the loss of statistical correlation, and leads to corresponding loss of information. In accordance with the 
original idea of Jaynes, irreversibility appears as a consequence of gradual loss of information about possible 
microstates of the system. 
I. INTRODUCTION



Maximum-entropy formalism, or alternatively MaxEnt algorithm, was formulated by E. T. 
Jaynes in his influential papers [1, 2] intended for applications in statistical mechanics. In Jaynes' 
approach a full development of the results of equilibrium statistical mechanics and formalism of 
Gibbs [3] was possible as a form of statistical inference based on Shannon's concept of information- 
theory entropy as a measure of information [4]. In the language of Jaynes, it is the correct measure 
of the "amount of uncertainty" in the probability distribution [5]. Maximization of information- 
theory entropy subject to certain constraints is a central concept in Jaynes' approach, and provides 
the least biased probability estimates subject to the available information. It is important that 
Jaynes sees Gibbs' formalism as an essential tool for statistical inference in different problems with
insufficient information. This includes equilibrium statistical mechanics [1] and the formulation of
a theory of irreversibility [2], which Jaynes tries to accomplish in his later works [5-9].

Predictions and calculations for different irreversible processes usually involve three distinct 
stages [7]: (1) Setting up an "ensemble", i.e., choosing an initial density matrix, or in our case an 
N-particle distribution function, which is to describe our initial knowledge about the system of
interest; (2) Solving the dynamical problem; i.e., applying the microscopic equations of motion to 
obtain the time evolution of the system; (3) Extracting the final physical predictions from the time 
developed ensemble. As fully recognized by Jaynes, the stage (1) and the availability of its general 
solution simplifies the complicated stage (2). The problem includes also an equally important 
stage (0) consisting of some kind of measurement or observation defining both the system and 
the problem [10]. In direct mathematical attempts that lead to a theory of irreversibility, the 
Liouville theorem with the conservation of phase space volume inherent to Hamiltonian dynamics, 
is represented as one of the main difficulties. Relation of the Liouville equation and irreversible 
macroscopic behavior is one of the central problems in statistical mechanics. For this reason it is 
reduced to an irreversible equation termed Boltzmann equation, rate equation or master equation. 
Far from creating difficulties, Jaynes considers the Liouville equation and the related constancy in 
time of Gibbs' entropy as precisely the dynamical property needed for solution of this problem, 
considering it to be more of a conceptual than a mathematical nature [6].

Mathematical clarity of this viewpoint has its basis in a limit theorem noted by Shannon [4], an
application of the fundamental asymptotic equipartition theorem of information theory. This theorem
relates Boltzmann's original formula for the entropy of a macrostate and the Gibbs expression
for entropy in the limit of a large number of particles [6, 7, 9]. The mathematical connection with
Boltzmann's interpretation of entropy as the logarithm of the number of ways (or microstates)
by which a macroscopic state can be realized, $S = k \log W$, then introduces a simple physical
interpretation to Gibbs' formalism and its generalizations in the maximum-entropy formalism.
Maximization of the information entropy subject to constraints then predicts the macroscopic behavior
that can happen in the greatest number of ways compatible with the available information.
In application to time dependent processes, this is referred to by Jaynes as the maximum caliber 
principle [8, 9]. Jaynes clearly stated that this does not represent a physical theory that explains 
the behavior of different systems by deductive reasoning from the first principles, but a form of 
statistical inference that makes predictions of observable phenomena from incomplete information 
[8]. For this reason predictive statistical mechanics cannot claim deductive certainty for its predictions.
This does not mean that it ignores the laws of microphysics; it certainly uses everything
known about the structure of microstates and any data on macroscopic quantities, without making 
any extra physical assumptions beyond what is given by available information. It is important to 
note that sharp, definite predictions of macroscopic behavior are possible only when it is characteristic
of each of the overwhelming majority of microstates compatible with data. For the same
reason, this is just the behavior that is reproduced experimentally under those constraints; this 
is known essentially as the principle of macroscopic uniformity [1, 2], or reproducibility [9]. In 
somewhat different context this property is recognized as the concept of macroscopic determinism, 
whose precise definition involves some sort of thermodynamic limit [11]. 

In Jaynes' view, the dynamical invariance of the Gibbs' entropy gives a simple proof of the 
second law, which is then a special case of a general requirement for any macroscopic process to be 
experimentally reproducible [6]. In the simple demonstration based on the Liouville theorem, this
makes it possible for Jaynes to generalize the second law beyond the restrictions of initial and final
equilibrium states, by considering it a special case of a general restriction on the direction of any 
reproducible process [6, 12]. The real reason behind the second law, since phase space volume is 
conserved in the dynamical evolution, is a fundamental requirement on any reproducible process 
that the phase space volume $W$, compatible with the final state, cannot be less than the phase
space volume $W_0$ which describes our ability to reproduce the initial state [6]. The arguments
used in this demonstration imply also the question how to determine which nonequilibrium or 
equilibrium states can be reached from others, and this is not possible without information about 
dynamics, constants of motion, constraints, etc. The second law predicts only that a change of 
macroscopic state will go in the general direction of greater final entropy [9]. Better predictions 
are possible only by introducing more information. Macrostates of higher entropy can be realized 
in overwhelmingly more ways, and this is the reason for high reliability of the Gibbs equilibrium
predictions [9]. In this context, Jaynes also speculated that accidental success in reversal of an 
irreversible process is exponentially improbable [12]. 

Jaynes' interpretation of irreversibility and the second law reflects the point of view of the 
actual experimenter. Zurek [13] has introduced algorithmic randomness as the measure of the 
complexity of the microscopic state. He has prescribed entropy not only to the ensemble but 
also to the microscopic state. This prescription makes the principal distinction between his and 
Jaynes' approach. The basic laws of computation reflected in this interpretation allow Zurek to 
formulate thermodynamics from the point of view of Maxwell demon-type entities that can acquire 
information through measurements and process it in a manner analogous to Turing machines. 
According to Jaynes, the detailed description of microscopic development of the system can not 
be extracted from the data about macroscopic development, and therefore it is not a subject of 
his approach. Increase of entropy is related to gradual decrease of information about possible 
microstates of the system compatible with data. The notion that the second law is a law of 
information dynamics, operating at the level of "information bookkeeping", has been considered 
recently by Duncan and Semura [14, 15]. In this line of thinking, the dynamics of information is 
considered to be coupled, but fundamentally independent of energy dynamics. 

MaxEnt algorithm and its methods represent a way of assigning probability distributions with 
the largest uncertainty and extent compatible with the available information, and for the same 
reasons, least biased with respect to unavailable information. Inferences drawn in this way depend 
only on our state of knowledge [1, 2]. In this work, two different applications of MaxEnt algorithm 
to macroscopic closed systems with Hamiltonian dynamics, and their time evolution, are examined 
in detail along with their consequences. The concepts of phase space paths with the path probability
distribution and associated conditional probability distribution are defined. The respective
path information entropy and conditional information entropy are introduced in correspondence 
with definitions in Shannon's information theory [4]. In the first approach, Liouville equation 
for the conditional probability distribution is introduced as a strict microscopic constraint on the 
time evolution in phase space, which is then completely determined by this constraint and initial 
values. Maximization of the conditional information entropy, subject to this constraint, predicts 
the macroscopic behavior that can happen in the greatest number of ways consistent with the 
information about microscopic dynamics. If probabilities are considered in the objective sense 
as a property of the system and not of our state of knowledge, full justification of this approach 
is possible only if our knowledge of the microscopic dynamics is complete. In a similar line of 
reasoning Grandy [16] has developed a detailed model of time dependent probabilities for macroscopic
systems within MaxEnt formalism and applied it to typical processes in nonequilibrium
thermodynamics and hydrodynamics [17, 18]. In a context of the interplay between macroscopic 
constraints on the system and its microscopic dynamics, it is interesting to note that MaxEnt has 
been also studied as a method of approximately solving partial differential equations governing 
the time evolution of probability distribution functions. For more complete further reference, we 
only mention here that this method, among other examples, has been applied to the Liouville-von 
Neumann equation [19], the family of dynamical systems with divergenceless phase space flows 
including Hamiltonian systems [20], the generalized Liouville equation and continuity equations 
[21]. Universality of this approach has been established for the general class of evolution equations
that conform to the essential requirements of linearity and preservation of normalization [22].
This method has been also considered for classical evolution equations with source terms within a 
framework where normalization is not preserved [23]. 

The first approach described above allows us to define the concepts that are the basis for our second approach.
The difference is that the Liouville equation for the conditional probability distribution is now introduced
as a macroscopic constraint. This constraint on time evolution of the phase space probability
density functions is now taken only on average, and it is given by the integral over accessible phase 
space. It is similar in this respect to constraints given by the data on macroscopic quantities. In 
Jaynes' predictive statistical mechanics more objectivity is ascribed to experimentally measured 
quantities than to probability distributions. The subjective aspect that becomes important here 
is that probabilities are assigned because of incomplete knowledge, i.e., partial information, and 
therefore represent our state of knowledge about the system. If information about dynamics is not 
sufficient to determine the time evolution, an average is taken over all cases possible on the basis 
of partial information. It is observed how elements of irreversible macroscopic behavior in closed 
systems with Hamiltonian dynamics are then a consequence of gradual loss of information about 
possible microstates of the system. This idea has been developed by Jaynes in the density-matrix 
formalism [2]. In the approach which is developed here, we show that irreversible macroscopic
behavior and Jaynes' interpretation based on reproducibility and information loss have a clear
mathematical description in the concepts of conditional information entropy, and its relation with 
the information entropy, i.e., in concepts of Shannon's information theory [4]. At the end of the 
work, the subjective and objective aspects of both approaches are indicated, and relations with 
entropy production are discussed. 
II. HAMILTONIAN DYNAMICS AND PHASE SPACE PATHS



The dynamical state of a Hamiltonian system with $s$ degrees of freedom is described by the
coordinates $q_1, q_2, \ldots, q_s$ and the momenta $p_1, p_2, \ldots, p_s$. At any time $t$ it is represented by a point
in the $2s$-dimensional Euclidean space $\Gamma$, here for our purposes called the phase space of the system.
The notation $(q,p)$ is introduced for the set of $2s$ coordinates and momenta. The time dependence
of the $2s$ dynamical variables $(q,p)$ is determined by Hamilton's equations

$$\dot{q}_i = \frac{\partial H}{\partial p_i}, \qquad \dot{p}_i = -\frac{\partial H}{\partial q_i}, \qquad 1 \le i \le s, \tag{1}$$

where $H = H(q,p)$ is the Hamiltonian function of the system. Given the values $(q_0,p_0)$ at some
time $t_0$, the solution of Hamilton's equations (1) uniquely determines the values of the dynamical
variables $(q,p)$ at any other time $t$,

$$q_i = q_i(t; q_0, p_0), \qquad p_i = p_i(t; q_0, p_0), \qquad 1 \le i \le s. \tag{2}$$

Any point $(q,p)$ in the phase space $\Gamma$ describes a curve called a phase space path, given by the
uniquely determined solution of (1). At any time $t$ through each point of $\Gamma$ passes only one path,
and this is denoted by the index $\omega$ in $(q,p)_\omega$, where $\omega \in \Omega(\Gamma)$. The set $\Omega(\Gamma)$ is the set of all paths in
$\Gamma$. The velocity $\vec{v}$ of the point $(q,p)$ in the phase space $\Gamma$ at time $t$ is given by



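$$v = |\vec{v}| = \left\{ \sum_{i=1}^{s} \left[ \left( \frac{\partial H}{\partial p_i} \right)^{2} + \left( \frac{\partial H}{\partial q_i} \right)^{2} \right] \right\}^{1/2}. \tag{3}$$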

The velocity vector $\vec{v}((q,p)_\omega, t)$ is tangential at the point $(q,p)_\omega \in \Gamma$ to the phase space path $\omega$
passing through it at time $t$. For the systems considered here the Hamiltonian function $H(q,p)$ does
not depend on time and the velocity field $\vec{v}(q,p,t)$ of all points in the phase space $\Gamma$ is stationary,
i.e., $\vec{v}(q,p,t) = \vec{v}(q,p)$.
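
The unique dependence of $(q,p)$ on the initial values in (2), and the stationary velocity field, can be illustrated numerically. The following is only a minimal sketch under stated assumptions: the one-dimensional harmonic oscillator Hamiltonian, the leapfrog integration scheme, and the function names `velocity`, `speed` and `flow` are illustrative choices that are not part of this work.

```python
import math

def velocity(q, p):
    """Phase space velocity (dq/dt, dp/dt) = (dH/dp, -dH/dq) from Hamilton's equations (1).

    Assumed illustrative Hamiltonian: H = (p**2 + q**2) / 2 (one degree of freedom).
    The field does not depend on time, so it is stationary.
    """
    return p, -q

def speed(q, p):
    """Magnitude v of the phase space velocity, as in (3)."""
    dq, dp = velocity(q, p)
    return math.hypot(dq, dp)

def flow(q0, p0, t, steps=10000):
    """Approximate (q(t; q0, p0), p(t; q0, p0)) of (2) with the leapfrog scheme."""
    dt = t / steps
    q, p = q0, p0
    for _ in range(steps):
        p -= 0.5 * dt * q   # half step in p: dp/dt = -dH/dq = -q
        q += dt * p         # full step in q: dq/dt = dH/dp = p
        p -= 0.5 * dt * q   # second half step in p
    return q, p

# The initial point (q0, p0) = (1, 0) determines the point reached at time t = 2.
q_t, p_t = flow(1.0, 0.0, t=2.0)
```

For this assumed Hamiltonian the exact path is $q = \cos t$, $p = -\sin t$, so the numerical result can be compared directly with the closed-form solution.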

Let $M_0$ be any measurable (in the sense of Lebesgue) set of points in the phase space $\Gamma$. In the
Hamiltonian motion the set $M_0$ is transformed into another set $M_t$ during an interval of time $t$.
Liouville's theorem asserts that the measure of the set $M_t$ for any $t$ coincides with the measure of
the set $M_0$ [24, pp. 15-16]. This theorem proves that the measure in the phase space $\Gamma$,

$$\mu(M_t) = \int_{M_t} dq_1 \ldots dq_s \, dp_1 \ldots dp_s = \int_{M_t} d\Gamma, \tag{4}$$

is invariant under Hamiltonian motion. In the notation used in (4), the volume element
$dq_1 \ldots dq_s \, dp_1 \ldots dp_s$ of the phase space $\Gamma$ is denoted by $d\Gamma$. One immediate corollary [24, pp.
18-19] of Liouville's theorem is that, if $M_0$ is a Lebesgue measurable set of points of the phase
space $\Gamma$, of finite measure, and $f(q,p)$ a phase function Lebesgue integrable over $\Gamma$, then

$$\int_{M_t} f(q,p) \, d\Gamma = \int_{M_0} f(q(t;q_0,p_0), p(t;q_0,p_0)) \, d\Gamma_0. \tag{5}$$

Equation (5) is obtained by changing the variables in the integral and introducing new variables
$(q_0,p_0)$, related to the variables $(q,p)$ by the transformation of the space $\Gamma$ into itself in Hamiltonian
motion, given by (2). If, in particular, the set $M$ is invariant to the Hamiltonian motion, then
using this corollary, it is easy to show how an integral of a phase function $f(q,p)$ over the invariant
set $M$ is transformed into an integration over the set $\Omega(M)$ of all paths in $M$. This procedure is
now developed in the rest of this section. It is used in the definition of probability distributions in
Sect. III.
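
The invariance of the measure (4) can be checked numerically in a simple case. The sketch below is an illustration only, under assumptions that are not part of this work: a pendulum Hamiltonian $H = p^2/2 - \cos q$ with one degree of freedom, a leapfrog integrator (chosen because each of its steps is itself an area-preserving map), and an initial set $M_0$ taken as a small disc whose boundary is approximated by a polygon.

```python
import math

def leapfrog_step(q, p, dt):
    """One leapfrog step for the assumed pendulum Hamiltonian H = p**2/2 - cos(q)."""
    p -= 0.5 * dt * math.sin(q)   # dp/dt = -dH/dq = -sin(q)
    q += dt * p                   # dq/dt =  dH/dp = p
    p -= 0.5 * dt * math.sin(q)
    return q, p

def evolve(points, t, steps=500):
    """Transport every point of a list of (q, p) pairs along its phase space path."""
    dt = t / steps
    moved = []
    for q, p in points:
        for _ in range(steps):
            q, p = leapfrog_step(q, p, dt)
        moved.append((q, p))
    return moved

def polygon_area(points):
    """Area enclosed by a closed polygon (shoelace formula)."""
    total = 0.0
    n = len(points)
    for k in range(n):
        q1, p1 = points[k]
        q2, p2 = points[(k + 1) % n]
        total += q1 * p2 - q2 * p1
    return 0.5 * abs(total)

# Boundary of the initial set M_0: a disc of radius 0.3 centred at (q, p) = (0.5, 0).
n_vertices = 2000
boundary_0 = [
    (0.5 + 0.3 * math.cos(2.0 * math.pi * k / n_vertices),
     0.3 * math.sin(2.0 * math.pi * k / n_vertices))
    for k in range(n_vertices)
]
boundary_t = evolve(boundary_0, t=3.0)

area_0 = polygon_area(boundary_0)   # approximates the measure of M_0
area_t = polygon_area(boundary_t)   # approximates the measure of M_t
```

The set $M_t$ is deformed by the motion, but `area_t` should agree with `area_0` up to the small error of the polygonal approximation of the boundary.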

At any time $t$ through each point $(q,p)_\omega \in \Gamma$ passes only one path $\omega \in \Omega(\Gamma)$ that also passes
through the point $(q_0,p_0)_\omega \in \Gamma$ given by the inverse of (2). The infinitesimal volume element
$d\Gamma_0$ around the point $(q_0,p_0)_\omega$ can be written as $d\Gamma_0 = ds_{0\omega} \, dS_{0\omega}$. Here, $ds_{0\omega}$ is the infinitesimal
distance along the path $\omega$. The infinitesimal element $dS_{0\omega}$ of the surface $S_0(M)$ intersects the path
$\omega$ perpendicularly at the point $(q_0,p_0)_\omega$. The surface $S_0(M)$ is perpendicular to all paths in the
set $\Omega(M)$ of paths in $M$.

The invariance of the measure $d\Gamma$ to Hamiltonian motion and the fact that the velocity field
$\vec{v}(q,p)$ in $\Gamma$ is stationary, as the Hamiltonian function $H(q,p)$ does not depend on time, lead to the
following consequence. For any phase space path $\omega \in \Omega(\Gamma)$, the product of the velocity $v((q,p)_\omega)$
and the infinitesimal surface $dS_\omega$ intersecting the path $\omega$ perpendicularly at the point $(q,p)_\omega$ is
constant under Hamiltonian motion along the entire length of the path $\omega$, i.e.,

$$v((q,p)_\omega) \, dS_\omega = \mathrm{const}. \tag{6}$$

For any two points $(q_0,p_0)_\omega$ and $(q_a,p_a)_\omega$ on the same path $\omega$, the following relation is obtained
from (6):

$$v((q_0,p_0)_\omega) \, dS_{0\omega} = v((q_a,p_a)_\omega) \, dS_{a\omega}. \tag{7}$$

The infinitesimal element $dS_{a\omega}$ of the surface $S_a(M)$ intersects the path $\omega$ perpendicularly at the
point $(q_a,p_a)_\omega$. Like the surface $S_0(M)$, the surface $S_a(M)$ is also perpendicular to all paths in $\Omega(M)$.
The infinitesimal elements $dS_{0\omega}$ and $dS_{a\omega}$ of the two surfaces $S_0(M)$ and $S_a(M)$ are connected by
the path $\omega$ and neighboring paths determined by solutions of Hamilton's equations. The integral
over the surface $S_a(M)$ is transformed using (7) into integration over the surface $S_0(M)$,

$$\int_{S_a(M)} dS_{a\omega} = \int_{S_0(M)} \frac{v((q_0,p_0)_\omega)}{v((q_a,p_a)_\omega)} \, dS_{0\omega}. \tag{8}$$

Functional dependence between the points $(q_0,p_0)_\omega$ and $(q_a,p_a)_\omega$ on the path $\omega$ is not explicitly
written in the integral (8); it is implied that this functional dependence is determined from solutions
of Hamilton's equations.

The following notation is introduced by using (2), with the times $t$ and $t_0$ fixed, in the phase
function $f(q,p)$:

$$f(q(t;q_0,p_0), p(t;q_0,p_0)) = g(q_0,p_0,t_0). \tag{9}$$

Equation (9) is then substituted (with $t$ and $t_0$ fixed and the indices in $(q_0,p_0)$ replaced by the
indices in $(q_a,p_a)$) into the integral (5), taken over the set $M$ which is invariant to Hamiltonian
motion. This leads to the following equality:

$$\int_{M} f(q,p) \, d\Gamma = \int_{M} g(q_a,p_a,t_0) \, d\Gamma_a. \tag{10}$$

The integral (10) is then transformed using relation (7) and $d\Gamma_a = ds_{a\omega} \, dS_{a\omega}$:

$$\int_{M} g(q_a,p_a,t_0) \, ds_{a\omega} \, dS_{a\omega} = \int_{S_0(M)} dS_{0\omega} \, v((q_0,p_0)_\omega) \int_{\omega} \frac{g(q_a,p_a,t_0)}{v(q_a,p_a)} \, ds_{a\omega}. \tag{11}$$

The function

$$F((q_0,p_0)_\omega, t_0) = v((q_0,p_0)_\omega) \int_{\omega} \frac{g(q_a,p_a,t_0)}{v(q_a,p_a)} \, ds_{a\omega}, \tag{12}$$

is defined on the surface $S_0(M)$ and is called a path function or path distribution. The integral in
the relation (12) defining a path function $F((q_0,p_0)_\omega, t_0)$ is over the entire length of the path $\omega$
intersected perpendicularly by the surface $S_0(M)$ at the point $(q_0,p_0)_\omega$. The infinitesimal element of
the phase space path $\omega$ passing through the point $(q_a,p_a)_\omega$ is $ds_{a\omega}$, and the time $t_0$ in the integral
(12) is fixed.

If the phase function $f(q,p)$ in (10) is a phase space probability density function, equal to
zero everywhere outside the invariant set $M$, it is straightforward to prove that the path function
$F((q_0,p_0)_\omega, t_0)$ defined by (12) satisfies the nonnegativity and normalization conditions required
from probability distributions. Nonnegativity and normalization of the function $F((q_0,p_0)_\omega, t_0)$,
which then represents a path probability distribution, follow from the nonnegativity and normalization
properties of the related phase space probability density function $f(q,p)$. With the help
of (10) and (11) and the definition of $F((q_0,p_0)_\omega, t_0)$ in (12), one then obtains the normalization
property

$$\int_{M} f(q,p) \, d\Gamma = \int_{S_0(M)} F((q_0,p_0)_\omega, t_0) \, dS_0 = 1. \tag{13}$$

Nonnegativity is established for all $(q_0,p_0)_\omega \in S_0(M)$ in a similar way. An integral over any invariant
and measurable subset of the set $M$ is transformed, in the way described above, into an integral over
a corresponding measurable subset on the surface $S_0(M)$. It is clear also that the measure defined
on the surface $S_0(M)$ can be utilized as a measure on the set $\Omega(M)$ of all phase space paths in
some invariant set $M$. The correspondence between points $(q_0,p_0)_\omega \in S_0(M)$ and paths $\omega \in \Omega(M)$
is one-to-one.

III. MICROSTATE PROBABILITY AND PATH PROBABILITY 

It is now possible to relate the microstate probability and the path probability in the phase
space $\Gamma$ of the system. Let the function $f(q,p,t)$ be a microstate probability density function on $\Gamma$.
All points in the phase space $\Gamma$ move according to Hamilton's equations (1) and $f(q,p,t)$ satisfies
the Liouville equation

$$\frac{df}{dt} \equiv \frac{\partial f}{\partial t} + \sum_{i=1}^{s} \left( \frac{\partial f}{\partial q_i} \frac{\partial H}{\partial p_i} - \frac{\partial f}{\partial p_i} \frac{\partial H}{\partial q_i} \right) = 0. \tag{14}$$

Since $df/dt$ is a total or hydrodynamic derivative, (14) expresses that the time rate of change of
$f(q,p,t)$ is zero along any phase space path given by the uniquely determined solution of Hamilton's
equations. In the notation used here, this fact is written as

$$f((q,p)_\omega, t) = f((q_0,p_0)_\omega, t_0), \tag{15}$$

where points on the path $\omega \in \Omega(\Gamma)$ are related by (2).

In addition to the definition of the path probability distribution $F((q_0,p_0)_\omega, t_0)$ via the microstate
probability density function $f(q,p,t)$, it is possible to give another equivalent definition
of $F((q_0,p_0)_\omega, t_0)$. In order to accomplish this, the probability density function $\mathcal{F}(q,p,t;q_0,p_0,t_0)$ is
introduced on the $4s$-dimensional Euclidean space $\Gamma \times \Gamma$. This function has the following special
properties. If the integral of $\mathcal{F}(q,p,t;q_0,p_0,t_0)$ is taken over the phase space $\Gamma$ with $(q,p)$ as the
integration variables, it gives the microstate probability density function $f(q_0,p_0,t_0)$ at time $t_0$,

$$\int_{\Gamma} \mathcal{F}(q,p,t;q_0,p_0,t_0) \, d\Gamma = f(q_0,p_0,t_0). \tag{16}$$

The microstate probability density function $f(q,p,t)$ at time $t$ is obtained analogously,

$$\int_{\Gamma} \mathcal{F}(q,p,t;q_0,p_0,t_0) \, d\Gamma_0 = f(q,p,t). \tag{17}$$
It is straightforward to prove, using relation (15), that (16) and (17) are satisfied if the function
$\mathcal{F}(q,p,t;q_0,p_0,t_0)$ has the following form:

$$\mathcal{F}(q,p,t;q_0,p_0,t_0) = f(q,p,t) \prod_{i=1}^{s} \delta(q_i - q_i(t;q_0,p_0)) \, \delta(p_i - p_i(t;q_0,p_0)), \tag{18}$$

where $q_i(t;q_0,p_0)$ and $p_i(t;q_0,p_0)$ are given by (2) and the $\delta$'s are Dirac delta functions. In the space
$\Gamma \times \Gamma$ the function $\mathcal{F}(q,p,t;q_0,p_0,t_0)$ given by (18) represents the probability density that the point
corresponding to the state of the system is in the element $d\Gamma_0$ around the point $(q_0,p_0)$ at time $t_0$
and in the element $d\Gamma$ around the point $(q,p)$ at time $t$.

As explained in (17), the microstate probability density function $f(q,p,t)$ is given by the integral
of the function $\mathcal{F}(q,p,t;q_0,p_0,t_0)$ over $\Gamma$ with $(q_0,p_0)$ as integration variables. Now, we assume
that the set $M$ of all points in $\Gamma$ which represent possible microstates of the system is invariant to
Hamiltonian motion. By applying the same procedure and notation that were already introduced
in relations (10) and (11), the integral (17) can now be written as

$$f(q,p,t) = \int_{S_0(M)} dS_{0\omega} \, v((q_0,p_0)_\omega) \int_{\omega} \frac{\mathcal{F}(q,p,t;q_a,p_a,t_0)}{v(q_a,p_a)} \, ds_{a\omega}. \tag{19}$$

Along the lines leading to (19), the function $G(q,p,t;(q_0,p_0)_\omega,t_0)$ is also introduced:

$$G(q,p,t;(q_0,p_0)_\omega,t_0) = v((q_0,p_0)_\omega) \int_{\omega} \frac{\mathcal{F}(q,p,t;q_a,p_a,t_0)}{v(q_a,p_a)} \, ds_{a\omega}. \tag{20}$$

The integral in the definition of $G(q,p,t;(q_0,p_0)_\omega,t_0)$ in (20) is over the entire length of the phase
space path $\omega$ intersected perpendicularly by the surface $S_0(M)$ at the point $(q_0,p_0)_\omega$. Using (20),
relation (19) is then written as

$$f(q,p,t) = \int_{S_0(M)} G(q,p,t;(q_0,p_0)_\omega,t_0) \, dS_0. \tag{21}$$

It is clear that the expression

$$G(q,p,t;(q_0,p_0)_\omega,t_0) \, dS_0 \, d\Gamma = P(q,p,t \cap (q_0,p_0)_\omega,t_0), \tag{22}$$

represents the probability that the point corresponding to the state of the system is at time $t_0$
anywhere along the paths which pass through an infinitesimal element $dS_0$ around $(q_0,p_0)$ on the
surface $S_0(M)$, and that at some different time $t$ it is in the volume element $d\Gamma$ around $(q,p)$.

Another definition of the path probability distribution $F((q_0,p_0)_\omega,t_0)$, in addition to (12), is
now possible in this way. It is given by the integral

$$F((q_0,p_0)_\omega,t_0) = \int_{\Gamma} G(q,p,t;(q_0,p_0)_\omega,t_0) \, d\Gamma. \tag{23}$$
Then, in accordance with the theory of probability, the ratio

$$\frac{G(q,p,t;(q_0,p_0)_\omega,t_0) \, dS_0 \, d\Gamma}{F((q_0,p_0)_\omega,t_0) \, dS_0} = P(q,p,t \,|\, (q_0,p_0)_\omega,t_0), \tag{24}$$

represents the conditional probability that at time $t$ the point corresponding to the state of the
system is in the element $d\Gamma$ around $(q,p)$, if at time $t_0$ it is anywhere along the paths passing
through the infinitesimal element $dS_0$ around $(q_0,p_0)$ on the surface $S_0(M)$. Relation (23) then
proves that the integral of (24) over $\Gamma$ satisfies the normalization condition, i.e.,

$$\int_{\Gamma} \frac{G(q,p,t;(q_0,p_0)_\omega,t_0) \, dS_0 \, d\Gamma}{F((q_0,p_0)_\omega,t_0) \, dS_0} = 1. \tag{25}$$

To set up all the tools of probability theory needed in this work, the conditional probability distribution
$D(q,p,t|(q_0,p_0)_\omega,t_0)$ that corresponds to the conditional probability (24) is defined by the relation

$$D(q,p,t|(q_0,p_0)_\omega,t_0) = \frac{G(q,p,t;(q_0,p_0)_\omega,t_0)}{F((q_0,p_0)_\omega,t_0)}. \tag{26}$$

The relation (24), like the relation (22), represents probability which is conserved in the phase
space $\Gamma$. The total time derivative (i.e., time rate of change along the Hamiltonian flow lines)
of this probability is equal to zero. In the relation (24) for the conditional probability, the path
probability distribution $F((q_0,p_0)_\omega,t_0)$ and the surface element $dS_0$ are independent of the variables
$t$ and $(q,p)$. Also, the measure $d\Gamma$ is invariant to Hamiltonian motion. Therefore, it follows that the
total time derivative of the conditional probability (24) is equal to zero if and only if

$$\frac{dG}{dt} \equiv \frac{\partial G}{\partial t} + \sum_{i=1}^{s} \left( \frac{\partial G}{\partial q_i} \frac{\partial H}{\partial p_i} - \frac{\partial G}{\partial p_i} \frac{\partial H}{\partial q_i} \right) = 0. \tag{27}$$

This is a straightforward demonstration that the probability distribution $G(q,p,t;(q_0,p_0)_\omega,t_0)$
satisfies the equation analogous to the Liouville equation (14) for the microstate probability
distribution $f(q,p,t)$.



IV. INFORMATION ENTROPIES AND MAXENT ALGORITHM 

The quantity of the form $H = -\sum_i p_i \log p_i$ has a central role in information theory as a measure
of information, choice and uncertainty for different probability distributions $p_i$. In an analogous
manner Shannon [4] has defined the entropy of a continuous distribution and the entropy of an $N$-dimensional
continuous distribution. As pointed out by Jaynes [5], the analog of $-\sum_i p_i \log p_i$ for a discrete
probability distribution $p_i$ which goes over in a limit of infinite number of points into a continuous
distribution $w(x)$ (in such a way that the density of points divided by the total number of points
approaches a definite function $m(x)$) is given by

$$S_I = -\int w(x) \log \left[ \frac{w(x)}{m(x)} \right] dx. \tag{28}$$

Shannon assumed the analog $-\int w(x) \log[w(x)] \, dx$, but he also pointed out an important difference
between his definitions of discrete and continuous entropies. If we change coordinates, the entropy
of a continuous distribution will in general change in the way taken into account by Shannon [4].
To achieve the required invariance of entropy of a continuous distribution under a change of the
independent variable, it is necessary to introduce the described modification that follows from
mathematical deduction [5]. This is achieved with an introduction of the measure function $m(x)$
and yields the invariant information measure (28). If a uniform measure $m = \mathrm{const}$ is assumed
[5], the invariant information measure (28) differs from Shannon's definition of entropy of a
continuous distribution [4] by an irrelevant additive constant.
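
The behavior under a change of the independent variable can be checked numerically. The sketch below is an illustration under assumptions that are not part of this work: Gaussian choices for $w(x)$ and $m(x)$, the linear change of variable $y = ax$, and a simple midpoint rule for the integrals. Under $y = ax$ both $w$ and $m$ transform as densities, e.g. $w_y(y) = w(y/a)/a$.

```python
import math

def gaussian(x, sigma):
    """Normalized Gaussian density with zero mean and standard deviation sigma."""
    return math.exp(-0.5 * (x / sigma) ** 2) / (sigma * math.sqrt(2.0 * math.pi))

def integrate(func, lo, hi, n=20000):
    """Midpoint rule on [lo, hi] with n subintervals."""
    h = (hi - lo) / n
    return h * sum(func(lo + (k + 0.5) * h) for k in range(n))

def shannon_entropy(w, lo, hi):
    """Shannon's continuous entropy: -integral of w log w."""
    return integrate(lambda x: -w(x) * math.log(w(x)), lo, hi)

def invariant_entropy(w, m, lo, hi):
    """Invariant information measure (28): -integral of w log(w/m)."""
    return integrate(lambda x: -w(x) * math.log(w(x) / m(x)), lo, hi)

a = 3.0                  # assumed scale factor of the change of variable y = a * x
x_lo, x_hi = -10.0, 10.0  # integration range in x; the tails beyond it are negligible

w_x = lambda x: gaussian(x, 1.0)          # illustrative distribution w(x)
m_x = lambda x: gaussian(x, 2.0)          # illustrative measure function m(x)
w_y = lambda y: gaussian(y / a, 1.0) / a  # w expressed in the new variable y
m_y = lambda y: gaussian(y / a, 2.0) / a  # m expressed in the new variable y

shannon_x = shannon_entropy(w_x, x_lo, x_hi)
shannon_y = shannon_entropy(w_y, a * x_lo, a * x_hi)
invariant_x = invariant_entropy(w_x, m_x, x_lo, x_hi)
invariant_y = invariant_entropy(w_y, m_y, a * x_lo, a * x_hi)
```

In this example `shannon_y` differs from `shannon_x` by $\log a$, while `invariant_x` and `invariant_y` coincide up to rounding, in line with the invariance property of (28).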

Shannon [4] has also defined the joint and conditional entropies of a joint distribution of two
continuous variables (which may themselves be multidimensional), concepts that are applied in
this work. In the previous section, the joint distribution $G(q,p,t;(q_0,p_0)_\omega,t_0)$ of two continuous
multidimensional variables $(q,p) \in \Gamma$ and $(q_0,p_0)_\omega \in S_0(M)$ was introduced. Following the detailed
explanation of (22), $G(q,p,t;(q_0,p_0)_\omega,t_0) \, dS_0 \, d\Gamma$ represents the probability of the joint occurrence
of two events: the first occurring at time $t_0$ among the set of all possible phase space paths $\Omega(M)$
and the second occurring at time $t$ among the set of all possible phase space points $M$ which is
invariant to Hamiltonian motion. In accordance with Shannon's definition [4], the joint information
entropy of the joint distribution $G(q,p,t;(q_0,p_0)_\omega,t_0)$ is given by

$$S_I^{G}(t,t_0) = -\int_{S_0(M)} \int_{\Gamma} G \log G \, d\Gamma \, dS_0. \tag{29}$$

The notation $S_I^{G}(t,t_0)$ indicates that it is a function of times $t$ and $t_0$, through the distribution
$G = G(q,p,t;(q_0,p_0)_\omega,t_0)$. Following Shannon's definition [4], the conditional information entropy of
the joint distribution $G(q,p,t;(q_0,p_0)_\omega,t_0)$ is then given by

$$S_I^{DF}(t,t_0) = -\int_{S_0(M)} \int_{\Gamma} G \log \left[ \frac{G}{F} \right] d\Gamma \, dS_0, \tag{30}$$

where $F = F((q_0,p_0)_\omega,t_0)$ is the path probability distribution. Using the definition of
$D(q,p,t|(q_0,p_0)_\omega,t_0)$ in (26), one immediately obtains the equivalent form of the conditional
information entropy (30):

$$S_I^{DF}(t,t_0) = -\int_{S_0(M)} \int_{\Gamma} D F \log D \, d\Gamma \, dS_0. \tag{31}$$

From (31) it is clear that the conditional information entropy $S_I^{DF}(t,t_0)$ is the average of the
entropy of the conditional probability $D(q,p,t|(q_0,p_0)_\omega,t_0)$, weighted over all possible phase space
paths $\omega \in \Omega(M)$ according to the path probability distribution $F((q_0,p_0)_\omega,t_0)$.

Relation between the information entropies Sf(t,to) and Sf F (t,to), introduced in (29) and 
(30), is completed by introducing the information entropy of the distribution F((qo,po)u>,to), or 
alternatively, path information entropy: 

Sf{to) = - f FlogFdSo. (32) 

JSo(M) 

Relation between Sf(t,to), Sf F (t,to) and Sf(to) is obtained straightforwardly, using (26) in (29), 
and then applying the properties of probability distributions. In this way one obtains 

Sf(t, t ) = Sf F (t, t ) + Sf (to). (33) 

Relation (33), in accordance with the analogous relation of Shannon [4], asserts that the uncertainty 
(or entropy) of the joint event is equal to the uncertainty of the first plus the uncertainty of the 
second event when the first is known. 
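
Relation (33) can be checked on a finite analog in which integrals become sums. The following is a minimal Python sketch under that assumption: the two "paths" and three "microstates", the arrays `F` and `D`, and the helper `entropy` are hypothetical and chosen only for illustration.

```python
import math

def entropy(probs):
    """Shannon entropy -sum p log p (natural log); zero entries contribute nothing."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)

# Hypothetical discrete analog: index i labels a path, index j a microstate.
F = [0.25, 0.75]                      # analog of the path probability distribution F
D = [[0.5, 0.25, 0.25],               # analog of the conditional distribution D,
     [0.1, 0.6, 0.3]]                 # one normalized row per path

# Joint distribution G = D * F, the discrete counterpart of (26).
G = [[F[i] * D[i][j] for j in range(3)] for i in range(2)]

S_G = entropy([g for row in G for g in row])              # analog of (29)
S_F = entropy(F)                                          # analog of (32)
S_DF = sum(F[i] * entropy(D[i]) for i in range(2))        # analog of (31)

print(S_G, S_DF + S_F)   # the two numbers agree up to rounding, as in (33)
```

The conditional entropy is computed here exactly as (31) describes it: the entropy of each conditional row, averaged with the path weights.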

It is important to add some comments on (33). In general, the uncertainty of the joint
event is less than or equal to the sum of the uncertainties of the two individual events, with equality
if (and only if) the two events are independent [4]. The probability distribution of the joint event
is given here by $G(q,p,t;(q_0,p_0)_\omega,t_0)$. The information entropy, or uncertainty, of one of them (in this
case called the second event because of its occurrence at a later time) is equal to

$$S_I^f(t) = -\int_\Gamma f\log f\,d\Gamma. \tag{34}$$

The quantity $S_I^f(t)$ is the information entropy of the microstate probability distribution $f(q,p,t)$,
or in short, the information entropy. The uncertainty of the first event is given by the path information
entropy $S_I^F(t_0)$ defined in (32). The aforementioned property of information entropies is expressed here
for $S_I^G(t,t_0)$, $S_I^f(t)$ and $S_I^F(t_0)$ by the following relation:

$$S_I^G(t,t_0) \le S_I^f(t) + S_I^F(t_0), \tag{35}$$

with equality if (and only if) the two events are independent. Furthermore, from (33) and (35),
one obtains an important relation between $S_I^f(t)$ and $S_I^{DF}(t,t_0)$:

$$S_I^f(t) \ge S_I^{DF}(t,t_0), \tag{36}$$

with equality if (and only if) the two events are independent.
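
The inequality (36) and its equality condition can be illustrated with a short Python sketch on a finite analog. The distributions below are hypothetical; `marginal_and_conditional` is an illustrative helper name, and sums stand in for the phase space integrals.

```python
import math

def entropy(probs):
    """Shannon entropy -sum p log p (natural log); zero entries contribute nothing."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)

def marginal_and_conditional(F, D):
    """Return (S_f, S_DF) for path weights F[i] and conditional rows D[i][j]."""
    n_states = len(D[0])
    # Discrete analog of the microstate distribution f: sum of D * F over paths.
    f = [sum(F[i] * D[i][j] for i in range(len(F))) for j in range(n_states)]
    S_f = entropy(f)                                           # analog of (34)
    S_DF = sum(F[i] * entropy(D[i]) for i in range(len(F)))    # analog of (31)
    return S_f, S_DF

F = [0.25, 0.75]
correlated = [[0.5, 0.25, 0.25], [0.1, 0.6, 0.3]]    # rows differ: events correlated
independent = [[0.2, 0.5, 0.3], [0.2, 0.5, 0.3]]     # identical rows: independence

S_f_c, S_DF_c = marginal_and_conditional(F, correlated)
S_f_i, S_DF_i = marginal_and_conditional(F, independent)

print(S_f_c - S_DF_c)   # strictly positive gap in the correlated case
print(S_f_i - S_DF_i)   # gap vanishes (up to rounding) in the independent case
```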

In terms of probability, the events occurring at time $t_0$ among the set of all possible phase space
paths $\Omega(M)$ and at any time $t$ among the set of all possible phase space points $M\subset\Gamma$ are not
independent. If we assume that the values of the joint probability distribution $G(q,p,t;(q_0,p_0)_\omega,t_0)$
are physically well defined (in the sense of (22)) for all points $(q,p)\in\Gamma$ and $(q_0,p_0)\in S_0(M)$ at the
given initial time $t = t_0$, then its values are determined at all times $t$ in the entire phase space
$\Gamma$ via the Liouville equation (27). It follows that maximization
of the conditional information entropy $S_I^{DF}(t,t_0)$, subject to the constraints of the Liouville equation
(27) and normalization, cannot yield the upper bound, which is given (at any time $t$) by the value
of the information entropy $S_I^f(t)$ in (36). Attaining this upper bound would require statistical
independence or, in other words, a complete loss of correlation between the set of possible phase
space paths $\Omega(M)$ at time $t_0$ and the set of possible phase space points $M\subset\Gamma$ at time $t$. This
is precluded at any time $t$ by the constraint implied by the Liouville equation (27) and by the
requirement that the joint probability distribution $G(q,p,t;(q_0,p_0)_\omega,t_0)$ is well defined.
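
A finite-state analog may help to see why a strict dynamical constraint keeps the bound in (36) out of reach. The Python sketch below is only an analogy under stated assumptions: an invertible map `perm` on four states stands in for the Hamiltonian flow, and `transport` stands in for the Liouville equation; all names and numbers are hypothetical.

```python
import math

def entropy(probs):
    """Shannon entropy -sum p log p (natural log); zero entries contribute nothing."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)

def entropies(F, D):
    """Return (S_f, S_DF): marginal entropy and F-averaged conditional entropy."""
    n = len(D[0])
    f = [sum(F[i] * D[i][j] for i in range(len(F))) for j in range(n)]
    return entropy(f), sum(F[i] * entropy(D[i]) for i in range(len(F)))

def transport(D, perm):
    """Move the probability sitting on state j to state perm[j], row by row."""
    moved = []
    for row in D:
        new_row = [0.0] * len(row)
        for j, p in enumerate(row):
            new_row[perm[j]] = p
        moved.append(new_row)
    return moved

F = [0.25, 0.75]
D0 = [[0.5, 0.25, 0.25, 0.0], [0.1, 0.6, 0.3, 0.0]]   # correlated initial condition
perm = [1, 2, 3, 0]                                    # hypothetical invertible one-step map

S_f0, S_DF0 = entropies(F, D0)
gap_start = S_f0 - S_DF0

D = D0
for _ in range(5):                                     # five steps of deterministic evolution
    D = transport(D, perm)
S_f5, S_DF5 = entropies(F, D)
gap_end = S_f5 - S_DF5

print(gap_start, gap_end)   # the gap between the two sides of (36) does not close
```

In this analog the invertible map only relabels states, so both entropies are unchanged and an initial correlation persists at all later steps.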

At this point it is helpful to distinguish between two aspects of time evolution. The first
is a microscopic aspect, which represents a problem of dynamics implied in this work by Hamilton's
equations. The solutions are represented in $\Gamma$ as phase space paths. The second is a macroscopic
aspect: predicting macroscopic time evolution represents a problem of available information and of
inferences from that partial information. Therefore, microscopic dynamics and the respective phase
space paths are also part of this problem of incomplete information. In the case of a macroscopic
system, information about microscopic dynamics is very likely to be incomplete, for a variety of
possible reasons. However, in the absence of more complete knowledge, Hamilton's equations (1) and
the set of possible phase space paths are the representation of our information about microscopic
dynamics. It is natural to assume that the predicted macroscopic time evolution of a closed system is
consistent with our knowledge of microscopic dynamics, even when this knowledge is not complete.

All the arguments mentioned above lead to the conclusion that regarding the Liouville equation (27)
as a strict microscopic constraint on time evolution is equivalent to having complete information
about microscopic dynamics. Following the previously introduced assumptions, the Liouville equation
(27) can also be regarded as a macroscopic constraint on time evolution. If our information about
microscopic dynamics is not sufficient to determine the time evolution, an average is taken over
all cases possible on the basis of our partial information. The conditional information entropy
$S_I^{DF}(t,t_0)$ is then maximized subject to the constraint of the Liouville equation (27), introduced as
a phase space average or, more precisely, as an integral over phase space, similarly to other
macroscopic constraints. In predictive statistical mechanics as formulated by Jaynes, inferences are drawn
from probability distributions whose sample spaces represent what is known about the structure
of microstates, and which maximize information entropy subject to the available macroscopic data [8].
In this way the "objectivity" of probability assignments and predictions is ensured by not introducing
extraneous assumptions unwarranted by the data. In this work we introduce the same basic idea into
stage (2) of the problem of prediction for closed Hamiltonian systems. This approach allows us to
consider the incomplete nature of our information about microscopic dynamics in a rational way,
and leads to the loss of correlation between the initial phase space paths and final microstates and
to uncertainty in prediction. The conditional information entropy $S_I^{DF}(t,t_0)$ is the measure of this
uncertainty, which is related to the loss of information about the state of the system.



V. MAXENT INFERENCES AND TIME EVOLUTION 



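In the first approach, the time evolution of the conditional probability distribution
$D(q,p,t|(q_0,p_0)_\omega,t_0)$ in the interval $t_0 < t < t_a$ should satisfy the following constraints: the
normalization condition

$$\int_M D(q,p,t|(q_0,p_0)_\omega,t_0)\,d\Gamma = 1, \tag{37}$$

and the Liouville equation for $D(q,p,t|(q_0,p_0)_\omega,t_0)$,

$$\frac{\partial D}{\partial t} + \sum_{i=1}^{s}\left(\frac{\partial D}{\partial q_i}\frac{\partial H}{\partial p_i} - \frac{\partial D}{\partial p_i}\frac{\partial H}{\partial q_i}\right) = 0. \tag{38}$$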

From (26) it follows that the constraints given by (27) and (38) are equivalent. By definition,
the set of all possible microstates $M\subset\Gamma$ is an invariant set. The normalization constraint (37)
contains information about the structure of possible microstates in $\Gamma$ in the time interval under
consideration, $t_0 < t < t_a$. Information about microscopic dynamics is represented by the set $\Omega(M)$
of possible phase space paths in $\Gamma$. In addition, this information is contained in the Liouville
equation (38). The assigned path probability distribution $F((q_0,p_0)_\omega,t_0)$ is compatible with the
available information.
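
The content of the Liouville equation (38) can be made concrete with a small numerical check. The Python sketch below assumes a hypothetical one-dimensional harmonic oscillator, $H = (p^2 + q^2)/2$, and an arbitrary unnormalized initial density `D0`; neither is taken from the text. A density carried along the phase space paths should give a vanishing residual of (38).

```python
import math

def D0(q0, p0):
    """Hypothetical (unnormalized) initial density, chosen only for illustration."""
    return math.exp(-((q0 - 1.0) ** 2 + 2.0 * p0 ** 2))

def D(q, p, t):
    """Density transported along the flow of H = (p**2 + q**2) / 2."""
    # Trace the point (q, p) at time t back along its phase space path to time 0.
    q0 = q * math.cos(t) - p * math.sin(t)
    p0 = q * math.sin(t) + p * math.cos(t)
    return D0(q0, p0)

def liouville_residual(q, p, t, h=1e-5):
    """Central-difference estimate of dD/dt + dD/dq * dH/dp - dD/dp * dH/dq."""
    dD_dt = (D(q, p, t + h) - D(q, p, t - h)) / (2 * h)
    dD_dq = (D(q + h, p, t) - D(q - h, p, t)) / (2 * h)
    dD_dp = (D(q, p + h, t) - D(q, p - h, t)) / (2 * h)
    dH_dp, dH_dq = p, q                      # derivatives of the assumed Hamiltonian
    return dD_dt + dD_dq * dH_dp - dD_dp * dH_dq

residual = liouville_residual(0.3, -0.7, 1.2)
print(residual)   # close to zero, limited only by finite-difference error
```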

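The time derivative of the conditional information entropy $S_I^{DF}(t,t_0)$ in (31) is given by

$$\frac{\partial S_I^{DF}(t,t_0)}{\partial t} = -\int_{S_0(M)}\int_M \frac{\partial D}{\partial t}\,F\log D\,d\Gamma\,dS_0 - \int_{S_0(M)}\int_M \frac{\partial D}{\partial t}\,F\,d\Gamma\,dS_0. \tag{39}$$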

Because of the normalization (37), the last term in (39) is equal to zero. At time $t_a$, the conditional
information entropy $S_I^{DF}(t_a,t_0)$ is given by the expression

$$S_I^{DF}(t_a,t_0) = -\int_{t_0}^{t_a}\int_{S_0(M)}\int_M \frac{\partial D}{\partial t}\,F\log D\,d\Gamma\,dS_0\,dt + S_I^{DF}(t_0,t_0). \tag{40}$$
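
The step from (39) to (40) rests on two short observations, sketched here in LaTeX. The sketch uses only (37) and (39), together with the assumption that $F$ does not depend on $t$ and that the time derivative may be interchanged with the integral over the invariant set $M$.

```latex
\begin{align*}
% Second term of (39): F is independent of t, and D is normalized by (37).
\int_{S_0(M)} F \left[\int_M \frac{\partial D}{\partial t}\,d\Gamma\right] dS_0
  &= \int_{S_0(M)} F\,\frac{\partial}{\partial t}\left[\int_M D\,d\Gamma\right] dS_0
   = \int_{S_0(M)} F\,\frac{\partial (1)}{\partial t}\,dS_0 = 0, \\
% Integrating the remaining term of (39) over t from t_0 to t_a gives (40).
S_I^{DF}(t_a,t_0) - S_I^{DF}(t_0,t_0)
  &= \int_{t_0}^{t_a} \frac{\partial S_I^{DF}(t,t_0)}{\partial t}\,dt
   = -\int_{t_0}^{t_a}\!\int_{S_0(M)}\!\int_M \frac{\partial D}{\partial t}\,F\log D\;d\Gamma\,dS_0\,dt .
\end{align*}
```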
The following functional is then formed:

$$J[D] = S_I^{DF}(t_a,t_0) - S_I^{DF}(t_0,t_0) = \int_{t_0}^{t_a}\int_{S_0(M)}\int_M K(D,\partial_t D)\,d\Gamma\,dS_0\,dt, \tag{41}$$

with the function $K(D,\partial_t D)$ given by

$$K(D,\partial_t D) = -\frac{\partial D}{\partial t}\,F\log D. \tag{42}$$

In the variational problem considered here, the functional $J[D]$ in (41) is rendered stationary
with respect to variations subject to the constraints (37) and (38). On the boundary of the integration
region $M\times(t_0,t_a)$ in the integral (41), the function $D(q,p,t|(q_0,p_0)_\omega,t_0)$ is not required to take on
prescribed values. The constraints given by (37) and (38) are written here in an equivalent but more
suitable form:

$$\varphi_1((q_0,p_0)_\omega,t_0;t,D) = F\int_M D\,d\Gamma - F = 0, \tag{43}$$

and

$$\varphi_2((q_0,p_0)_\omega,t_0;q,p,t,\partial_q D,\partial_p D,\partial_t D) = \left[\frac{\partial D}{\partial t} + \sum_{i=1}^{s}\left(\frac{\partial D}{\partial q_i}\frac{\partial H}{\partial p_i} - \frac{\partial D}{\partial p_i}\frac{\partial H}{\partial q_i}\right)\right] F = 0. \tag{44}$$



Methods for variational problems with this type of constraints exist, and one can develop and
apply them in practical problems [25]. Here, in a notation adapted to this particular
problem, the following functionals are introduced:

$$C_1[D,\lambda_1] = \int_{S_0(M)}\int_{t_0}^{t_a} \lambda_1\varphi_1\,dt\,dS_0, \tag{45}$$

and

$$C_2[D,\lambda_2] = \int_{S_0(M)}\int_{t_0}^{t_a}\int_M \lambda_2\varphi_2\,d\Gamma\,dt\,dS_0. \tag{46}$$

The Lagrange multipliers $\lambda_1 = \lambda_1((q_0,p_0)_\omega,t_0;t)$ and $\lambda_2 = \lambda_2((q_0,p_0)_\omega,t_0;q,p,t)$ are functions
defined in the integration regions in (45) and (46). For any function with continuous first partial
derivatives, the Euler expression for the constraint $\varphi_2 = \varphi_2((q_0,p_0)_\omega,t_0;q,p,t,\partial_q D,\partial_p D,\partial_t D)$ is
identically equal to zero. Following the multiplier rule for such problems explained in ref. [25], we
introduce an additional (constant) Lagrange multiplier $\lambda_0$ as a multiplicative factor for $K$,

$$J[D,\lambda_0] = \int_{t_0}^{t_a}\int_{S_0(M)}\int_M \lambda_0 K(D,\partial_t D)\,d\Gamma\,dS_0\,dt. \tag{47}$$

The functional $I[D,\lambda_0,\lambda_1,\lambda_2]$ is formed from $J[D,\lambda_0]$, $C_1[D,\lambda_1]$ and $C_2[D,\lambda_2]$:

$$I[D,\lambda_0,\lambda_1,\lambda_2] = J[D,\lambda_0] - C_1[D,\lambda_1] - C_2[D,\lambda_2]. \tag{48}$$

The existence of Lagrange multipliers $\lambda_0 \neq 0$, $\lambda_1$ and $\lambda_2$ such that $I[D,\lambda_0,\lambda_1,\lambda_2]$ is
stationary, $\delta I = 0$, represents a proof that it is possible to make $J[D]$ in (41) stationary subject
to constraints (43) and (44). The function $D(q,p,t|(q_0,p_0)_\omega,t_0)$ which renders $J[D]$ stationary
subject to (43) and (44) must satisfy the Euler equation:




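$$\lambda_0\left\{\frac{\partial K}{\partial D} - \frac{\partial}{\partial t}\left(\frac{\partial K}{\partial(\partial_t D)}\right) - \sum_{i=1}^{s}\left[\frac{\partial}{\partial q_i}\left(\frac{\partial K}{\partial(\partial_{q_i} D)}\right) + \frac{\partial}{\partial p_i}\left(\frac{\partial K}{\partial(\partial_{p_i} D)}\right)\right]\right\} - \lambda_1 F + \left[\frac{\partial\lambda_2}{\partial t} + \sum_{i=1}^{s}\left(\frac{\partial\lambda_2}{\partial q_i}\frac{\partial H}{\partial p_i} - \frac{\partial\lambda_2}{\partial p_i}\frac{\partial H}{\partial q_i}\right)\right] F = 0. \tag{49}$$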



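It is easy to check that the term multiplied by $\lambda_0$ in the Euler equation (49) is equal to zero. Stationarity
of the functional $I[D,\lambda_0,\lambda_1,\lambda_2]$ in (48) is therefore possible even with $\lambda_0 \neq 0$. From (49) it follows
that the Lagrange multipliers $\lambda_1$ and $\lambda_2$ satisfy the equation

$$\frac{\partial\lambda_2}{\partial t} + \sum_{i=1}^{s}\left(\frac{\partial\lambda_2}{\partial q_i}\frac{\partial H}{\partial p_i} - \frac{\partial\lambda_2}{\partial p_i}\frac{\partial H}{\partial q_i}\right) = \lambda_1. \tag{50}$$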
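
The statement that the term multiplied by $\lambda_0$ in (49) vanishes can be written out explicitly. The following LaTeX sketch starts from the form of $K$ in (42) and assumes only that $F$ does not depend on $t$ and that $K$ contains no derivatives of $D$ with respect to $q_i$ or $p_i$.

```latex
\begin{align*}
K &= -\frac{\partial D}{\partial t}\,F\log D, \\
\frac{\partial K}{\partial D} &= -\frac{F}{D}\,\frac{\partial D}{\partial t}, \qquad
\frac{\partial K}{\partial(\partial_t D)} = -F\log D, \qquad
\frac{\partial K}{\partial(\partial_{q_i} D)} = \frac{\partial K}{\partial(\partial_{p_i} D)} = 0, \\
\frac{\partial}{\partial t}\left(\frac{\partial K}{\partial(\partial_t D)}\right)
  &= -\frac{F}{D}\,\frac{\partial D}{\partial t}, \\
\frac{\partial K}{\partial D} - \frac{\partial}{\partial t}\left(\frac{\partial K}{\partial(\partial_t D)}\right)
  &= -\frac{F}{D}\,\frac{\partial D}{\partial t} + \frac{F}{D}\,\frac{\partial D}{\partial t} = 0 .
\end{align*}
```

What remains of (49) is the part that contains only $\lambda_1$, $\lambda_2$ and $F$, which gives (50) wherever $F \neq 0$.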

In this variational problem, the function $D(q,p,t|(q_0,p_0)_\omega,t_0)$ that renders $J[D]$ in (41)
stationary subject to constraints (43) and (44) is not required to take on prescribed values on the
boundary of the integration region $M\times(t_0,t_a)$. Therefore, in addition to satisfying
the Euler equation (49), it must also satisfy the Euler boundary condition on the boundary of
$M\times(t_0,t_a)$, ref. [25]. For all points on the portion of the boundary of $M\times(t_0,t_a)$ where time
$t = t_0$ or $t = t_a$, the Euler boundary condition gives independently:

$$\left[\frac{\partial K}{\partial(\partial_t D)} - \lambda_2 F\right]_{t=t_0,t_a} = -\left[\log D + \lambda_2\right]_{t=t_0,t_a} F = 0. \tag{51}$$



For all points on the portion of the boundary of $M\times(t_0,t_a)$ where time $t$ is in the interval
$t_0 < t < t_a$, the Euler boundary condition gives:

$$F\left[\lambda_2\,\mathbf{v}\cdot\mathbf{n}\right]_{\text{at the boundary of } M} = 0. \tag{52}$$

In (52), $\mathbf{v}\cdot\mathbf{n}$ is the scalar product of the velocity field $\mathbf{v}(q,p)$ in $\Gamma$ (defined in Sect. II) and the
unit normal $\mathbf{n}$ of the boundary surface of the invariant set $M$, taken at the surface. Equation (52)
is satisfied naturally due to Hamiltonian motion, since the set $M$ is invariant by definition, and
therefore $\mathbf{v}\cdot\mathbf{n} = 0$ for all points on the boundary surface of $M$. This is a consequence of the fact
that phase space paths do not cross over the boundary surface of the invariant set M.
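
The vanishing of $\mathbf{v}\cdot\mathbf{n}$ on the boundary of an invariant set can be seen in a simple case. The Python sketch below assumes, purely for illustration, a one-dimensional system with the hypothetical Hamiltonian $H = p^2/2 + q^4/4$ and an invariant set bounded by the level curve $H = E$, whose normal is parallel to the gradient of $H$; the function names are illustrative.

```python
import math

def grad_H(q, p):
    """Gradient (dH/dq, dH/dp) of the hypothetical H = p**2/2 + q**4/4."""
    return q ** 3, p

def v_dot_n(q, p):
    """Scalar product of the Hamiltonian velocity field and the unit normal of H = const."""
    dH_dq, dH_dp = grad_H(q, p)
    v = (dH_dp, -dH_dq)                     # Hamilton's equations: (dq/dt, dp/dt)
    norm = math.hypot(dH_dq, dH_dp)
    n = (dH_dq / norm, dH_dp / norm)        # unit normal of the level curve
    return v[0] * n[0] + v[1] * n[1]

# Sample points on the upper branch of the level curve H = E.
E = 1.0
values = []
for q in (-1.2, -0.5, 0.0, 0.8, 1.3):
    p = math.sqrt(2.0 * (E - q ** 4 / 4.0))
    values.append(v_dot_n(q, p))

print(values)   # every entry vanishes up to rounding: no flow across the boundary
```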

Functions $D(q,p,t|(q_0,p_0)_\omega,t_0)$ that render $J[D]$ in (41) stationary subject to the constraints
(43) and (44) are determined from the constraints and the boundary condition given by (51). From
(51) one obtains the form of $D(q,p,t|(q_0,p_0)_\omega,t_0)$ at times $t_0$ and $t_a$,

$$D(q,p,t|(q_0,p_0)_\omega,t_0)\big|_{t=t_0,t_a} = \exp\left[-\lambda_2((q_0,p_0)_\omega,t_0;q,p,t)\right]\big|_{t=t_0,t_a}. \tag{53}$$

Since it is only required that $t_a > t_0$, time $t_a$ is arbitrary in other respects. The boundary condition
(51) then holds for any time $t > t_0$:

$$D(q,p,t|(q_0,p_0)_\omega,t_0) = \exp\left[-\lambda_2((q_0,p_0)_\omega,t_0;q,p,t)\right]. \tag{54}$$

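From the constraint (44), using (54), one immediately obtains an equation for the Lagrange
multiplier $\lambda_2((q_0,p_0)_\omega,t_0;q,p,t)$:

$$\frac{\partial\lambda_2}{\partial t} + \sum_{i=1}^{s}\left(\frac{\partial\lambda_2}{\partial q_i}\frac{\partial H}{\partial p_i} - \frac{\partial\lambda_2}{\partial p_i}\frac{\partial H}{\partial q_i}\right) = 0. \tag{55}$$

By comparison of (50) with (55), it follows that for all $t > t_0$,

$$\lambda_1((q_0,p_0)_\omega,t_0;t) = 0. \tag{56}$$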
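
The substitution that leads from (54) to (55) can be spelled out as follows. This LaTeX sketch assumes the Liouville form of the constraint (44), with $s$ denoting the upper limit of the sum over canonical pairs and with $F \neq 0$.

```latex
\begin{align*}
D = e^{-\lambda_2}
\;&\Longrightarrow\;
\frac{\partial D}{\partial t} = -D\,\frac{\partial\lambda_2}{\partial t},\quad
\frac{\partial D}{\partial q_i} = -D\,\frac{\partial\lambda_2}{\partial q_i},\quad
\frac{\partial D}{\partial p_i} = -D\,\frac{\partial\lambda_2}{\partial p_i}, \\
0 &= \frac{\partial D}{\partial t}
   + \sum_{i=1}^{s}\left(\frac{\partial D}{\partial q_i}\frac{\partial H}{\partial p_i}
   - \frac{\partial D}{\partial p_i}\frac{\partial H}{\partial q_i}\right)
   = -D\left[\frac{\partial\lambda_2}{\partial t}
   + \sum_{i=1}^{s}\left(\frac{\partial\lambda_2}{\partial q_i}\frac{\partial H}{\partial p_i}
   - \frac{\partial\lambda_2}{\partial p_i}\frac{\partial H}{\partial q_i}\right)\right].
\end{align*}
```

Since $D = e^{-\lambda_2} > 0$, the bracket must vanish, which is (55). The same bracket is the left-hand side of (50), so that $\lambda_1 = 0$.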

As explained in Sect. IV, for any physically well defined conditional probability distribution
$D(q,p,t|(q_0,p_0)_\omega,t_0)$ (in the sense of (24) and (26)), the upper bound on $S_I^{DF}(t,t_0)$, given by
(36), is not attained in maximization.

The conclusions that follow from the interpretation of (36) and from the property of $S_I^{DF}(t,t_0)$ as a
measure of uncertainty, explained in Sect. IV, are now considered in the second approach. This
is done by replacing the strict equality constraint (44) by a constraint of isoperimetric
form,

$$\varphi_2((q_0,p_0)_\omega,t_0;t,D) = \int_M \left[\frac{\partial D}{\partial t} + \sum_{i=1}^{s}\left(\frac{\partial D}{\partial q_i}\frac{\partial H}{\partial p_i} - \frac{\partial D}{\partial p_i}\frac{\partial H}{\partial q_i}\right)\right] F\,d\Gamma = 0. \tag{57}$$

The functional (46) is then replaced with the functional

$$C_2[D,\lambda_2] = \int_{S_0(M)}\int_{t_0}^{t_a} \lambda_2\varphi_2\,dt\,dS_0. \tag{58}$$

The Lagrange multiplier $\lambda_2 = \lambda_2((q_0,p_0)_\omega,t_0;t)$ is now a function defined in the integration region of the
integral (58). The information that the set $M$ of possible microstates is invariant to Hamiltonian motion
is contained in the constraint (57). The analogy with (44) is not complete, because a much larger class of
functions satisfies the constraint (57), including all functions that also satisfy the constraint
(44). This fact allows for maximization of the conditional information entropy $S_I^{DF}(t_a,t_0)$ in (40),
subject to constraints (43) and (57), even if $D(q,p,t|(q_0,p_0)_\omega,t_0)$ is prescribed at the initial time $t_0$.
The prescribed $D(q,p,t|(q_0,p_0)_\omega,t_0)$ at the initial time $t_0$ must be physically well defined in the sense
of (24) and (26). In this variational problem, the function $D(q,p,t|(q_0,p_0)_\omega,t_0)$ is not required to take
on prescribed values on the remaining portion of the boundary of the integration region $M\times(t_0,t_a)$
in (40).

For a function $D(q,p,t|(q_0,p_0)_\omega,t_0)$ to maximize $S_I^{DF}(t_a,t_0)$ subject to constraints (43) and
(57), it is necessary that it satisfies the Euler equation:

$$\frac{\partial K}{\partial D} - \frac{\partial}{\partial t}\left(\frac{\partial K}{\partial(\partial_t D)}\right) - \sum_{i=1}^{s}\left[\frac{\partial}{\partial q_i}\left(\frac{\partial K}{\partial(\partial_{q_i} D)}\right) + \frac{\partial}{\partial p_i}\left(\frac{\partial K}{\partial(\partial_{p_i} D)}\right)\right] - \lambda_1 F + \frac{\partial\lambda_2}{\partial t}\,F = 0. \tag{59}$$

Another necessary condition for a maximum, in addition to (59), exists if the function
$D(q,p,t|(q_0,p_0)_\omega,t_0)$ is not required to take on prescribed values on a portion of the boundary
of $M\times(t_0,t_a)$: it is then necessary that $D(q,p,t|(q_0,p_0)_\omega,t_0)$ satisfies the Euler boundary
condition on the portion of the boundary of $M\times(t_0,t_a)$ where its values are not prescribed, ref. [25].
In accordance with this, for all points on the portion of the boundary of $M\times(t_0,t_a)$ where $t = t_a$,
the Euler boundary condition gives:

$$\left[\frac{\partial K}{\partial(\partial_t D)} - \lambda_2 F\right]_{t=t_a} = -\left[\log D + \lambda_2\right]_{t=t_a} F = 0. \tag{60}$$

The Euler boundary condition is satisfied naturally for all points on the portion of the boundary
of $M\times(t_0,t_a)$ where time $t$ is in the interval $t_0 < t < t_a$. The set $M$ is invariant to Hamiltonian
motion, and an equation analogous to (52) is also satisfied here naturally due to Hamiltonian motion.

In a manner analogous to that leading to (50) in the first approach, the Euler equation (59) now leads to
the equation for the Lagrange multipliers $\lambda_1((q_0,p_0)_\omega,t_0;t)$ and $\lambda_2((q_0,p_0)_\omega,t_0;t)$:

$$\frac{\partial\lambda_2((q_0,p_0)_\omega,t_0;t)}{\partial t} = \lambda_1((q_0,p_0)_\omega,t_0;t). \tag{61}$$

The form of the MaxEnt conditional probability distribution at time $t_a$ follows from (60):

$$D(q,p,t_a|(q_0,p_0)_\omega,t_0) = \exp\left[-\lambda_2((q_0,p_0)_\omega,t_0;t_a)\right]. \tag{62}$$

For a well defined conditional probability distribution at the initial time $t_0$, there is an entire class of
equally probable solutions $D(q,p,t|(q_0,p_0)_\omega,t_0)$ obtained by the MaxEnt algorithm, which all satisfy
the macroscopic constraint (57). At time $t_a$, all functions in this class of MaxEnt solutions are equal
and given by (62). With the exception of times $t_0$ and $t_a$, the conditional probability distribution
$D(q,p,t|(q_0,p_0)_\omega,t_0)$ obtained by the MaxEnt algorithm is not uniquely determined in the interval
$t_0 < t < t_a$. This is a consequence of the fact that the macroscopic constraint (57) does not
determine the time evolution of $D(q,p,t|(q_0,p_0)_\omega,t_0)$ uniquely, in the way that the strict microscopic
constraint (44) does. However, MaxEnt solutions still predict only time evolutions entirely within
the invariant set $M$, due to (52). This property follows from the constraint (57), and takes into
account the information about the constants of motion that determine the invariant set $M$ and, in
this way, about the related conservation laws.

From the normalization (37) of the conditional probability distribution, given at time $t_a$ by
(62), one obtains the relation:

$$W(M)\exp\left[-\lambda_2((q_0,p_0)_\omega,t_0;t_a)\right] = 1, \tag{63}$$

where $W(M)$ is the measure, i.e., the phase space volume, of the invariant set $M$. Equation (63)
implies that the Lagrange multiplier $\lambda_2((q_0,p_0)_\omega,t_0;t)$ at time $t = t_a$ is independent of the variables
$(q_0,p_0)_\omega$:

$$\lambda_2((q_0,p_0)_\omega,t_0;t_a) = \lambda_2(t_a). \tag{64}$$

The microstate probability distribution $f(q,p,t)$ at time $t = t_a$ is then calculated by using (21) and
(26), the MaxEnt conditional probability distribution $D(q,p,t|(q_0,p_0)_\omega,t_0)$ at time $t = t_a$ given by
(62) and (64), and the path probability distribution $F((q_0,p_0)_\omega,t_0)$ at the initial time $t_0$:

$$f(q,p,t_a) = \exp\left[-\lambda_2(t_a)\right]. \tag{65}$$

It follows from (62)-(65) that at time $t_a$, the MaxEnt conditional probability distribution and the
corresponding microstate probability distribution are equal,

$$D(q,p,t_a|(q_0,p_0)_\omega,t_0) = f(q,p,t_a) = \exp\left[-\lambda_2(t_a)\right] = \frac{1}{W(M)}. \tag{66}$$

From (31), (34) and (66), one obtains the values of the information entropies $S_I^{DF}(t,t_0)$ and $S_I^f(t)$ at
time $t_a$,

$$S_I^f(t_a) = S_I^{DF}(t_a,t_0) = \log W(M). \tag{67}$$
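
The values in (67) have a simple finite analog. The Python sketch below assumes that the invariant set is divided into `W` equal cells, so that the uniform distribution (66) becomes a probability of `1 / W` per cell; the number of cells and the path weights `F` are hypothetical.

```python
import math

def entropy(probs):
    """Shannon entropy -sum p log p (natural log); zero entries contribute nothing."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)

W = 8                                  # hypothetical number of equal cells covering M
F = [0.2, 0.3, 0.5]                    # hypothetical path probabilities
D = [[1.0 / W] * W for _ in F]         # analog of (66): every conditional row is uniform

# Analog of the microstate distribution: it inherits the uniform form of the rows.
f = [sum(F[i] * D[i][j] for i in range(len(F))) for j in range(W)]

S_f = entropy(f)                                           # analog of (34)
S_DF = sum(F[i] * entropy(D[i]) for i in range(len(F)))    # analog of (31)

print(S_f, S_DF, math.log(W))   # all three agree up to rounding, as in (67)
```

Because every conditional row is the same, the final cell carries no information about the path, which is the discrete counterpart of the complete loss of correlation.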

Equations (66) and (67) are possible only in the case of statistical independence, i.e., the complete loss
of correlation between the phase space paths at time $t_0$ and the microstates at time $t_a$. A general
property of macroscopic systems is that they appear to randomize themselves between observations,
provided that the observations follow each other by a time interval longer than a certain
characteristic time $\tau$ called the relaxation time [26]. In the interpretation given here, the relaxation
time $\tau$ for a closed Hamiltonian system represents a characteristic time required for the described
loss of correlation between the initial phase space paths and final microstates. Furthermore, $\tau$ also
represents a time interval during which predictions, based on incomplete information about
microscopic dynamics, become uncertain to the maximum extent compatible with the data. This uncertainty
is related to the loss of information about the state of the system.

This interpretation is reflected in the role of the Lagrange multipliers $\lambda_1((q_0,p_0)_\omega,t_0;t)$ and
$\lambda_2((q_0,p_0)_\omega,t_0;t)$. They are required to satisfy (61), and by integrating it one obtains the following
relation,

$$\lambda_2((q_0,p_0)_\omega,t_0;t) = \int_{t_0}^{t} \lambda_1((q_0,p_0)_\omega,t_0;t')\,dt' + \lambda_2((q_0,p_0)_\omega,t_0;t_0), \tag{68}$$

for all $t$ in the interval $t_0 < t < t_a$. By using (68), with (63), (64) and (67), one obtains

$$S_I^f(t_a) = S_I^{DF}(t_a,t_0) = \log W(M) = \int_{t_0}^{t_a} \lambda_1((q_0,p_0)_\omega,t_0;t)\,dt + \lambda_2((q_0,p_0)_\omega,t_0;t_0). \tag{69}$$
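
The chain of steps that combines (63), (64), (67) and (68) into (69) can be laid out as follows. This LaTeX sketch assumes that (68) may be evaluated at the endpoint $t = t_a$.

```latex
\begin{align*}
\text{(63), (64):}\quad & W(M)\,e^{-\lambda_2(t_a)} = 1
  \;\Longrightarrow\; \lambda_2(t_a) = \log W(M), \\
\text{(68) at } t = t_a:\quad & \lambda_2(t_a)
  = \int_{t_0}^{t_a}\lambda_1((q_0,p_0)_\omega,t_0;t)\,dt + \lambda_2((q_0,p_0)_\omega,t_0;t_0), \\
\text{(67):}\quad & S_I^f(t_a) = S_I^{DF}(t_a,t_0) = \log W(M) = \lambda_2(t_a).
\end{align*}
```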

It is clear from relations (64), (68) and (69) that at time $t_a$ the Lagrange multiplier
$\lambda_2((q_0,p_0)_\omega,t_0;t_a) = \lambda_2(t_a)$ is determined by the measure $W(M)$ of the invariant set $M$ of possible
microstates, i.e., the volume of accessible phase space. The subsequent application of the MaxEnt
algorithm of the described type to a closed system with Hamiltonian dynamics, without the
introduction of additional constraints, results in an increase of $W(M)$. From (64), (68) and (69) it
is then deduced that $\lambda_2(t_a) > \lambda_2(t_0)$.

Information about the structure of possible microstates restricts the corresponding set, and
therefore sets an upper bound on the volume of accessible phase space. The values of $S_I^{DF}(t_a,t_0)$
and $S_I^f(t_a)$ at time $t_a$, given in (69), are equal to the maximum value of the Boltzmann-Gibbs
entropy compatible with this information. The Lagrange multiplier $\lambda_1((q_0,p_0)_\omega,t_0;t)$, integrated
in (69) over the time $t_0 < t < t_a$, is then determined by the rate at which the maximum
Boltzmann-Gibbs entropy is attained in reproducible time evolution. The integral in (69) and the quantity
$\lambda_1((q_0,p_0)_\omega,t_0;t)$ can be identified with the change in entropy and the entropy production for
a closed Hamiltonian system, respectively. If information about the microscopic dynamics of a closed
Hamiltonian system is considered complete, whether entropy production can be defined without
recourse to coarse graining procedures or to macroscopic, phenomenological approaches remains an
open question. In general, information is discarded in all such models at some stage, in order to
match what is observed in nature.

VI. CONCLUSION AND RELATED ISSUES 

If we consider the possibility that our information about microscopic dynamics is incomplete,
reproducibility and information loss become a part of the description of macroscopic time evolution.
This approach then leads to a simple definition of entropy production. The idea that irreversibility
is related to a gradual loss of information was developed by Jaynes in the density-matrix
formalism [2]. Recently, Duncan and Semura [14, 15] suggested that information is really
lost at a fundamental level. The interplay of quantum decoherence and dynamics is considered
one of the possible reasons behind the second law of thermodynamics, with entropy production
caused by information leaking into the environment [27, 28]. In our classical approach, information
loss is related to incomplete information about microscopic dynamics. If one considers this idea
carefully, even in this simple model, incompleteness of information must be taken into account in
some unbiased way.

The issues related to incomplete information can be discussed in an objective manner. Analytical
principles for such purposes, i.e. for a separation of the subjective and objective aspects of the
theoretical formalism, are found in probability theory. The philosophy of this approach is based on the
interpretation of probability theory as a natural extension of deductive logic. Such a generalization
was developed in an axiomatic way by Jaynes in his treatise on probability theory [29]. It was
intended as a tool for plausible inference in situations of incomplete information. The standard
axiomatic probability theory is derived from this generalized theory, which in itself suggests that the
generalized theory is a proper tool for incorporating new information into our probability
distributions. Probability distributions are interpreted in that sense as carriers of incomplete information.
This approach is perhaps best understood from descriptions given by Jaynes [29]: ". . . probability
theory as a generalized logic of plausible inference which should apply, in principle, to any situation
where we do not have enough information to permit deductive reasoning." We also quote the
following lines from [29], which we think are important for the discussion in the next paragraph: "But
this is equally true of abstract mathematical systems; when a proposition is undecidable in such a
system, that means only that its axioms do not provide enough information to decide it. But new
axioms, external to the original set, might supply the missing information and make the proposition
decidable after all." We can conclude that when probabilities are interpreted in a related way, as
a property of our state of knowledge, and are applied together with the MaxEnt algorithm, the
mathematical description of irreversible behavior fits naturally within the concepts of Shannon's
information theory [4].

Another objective aspect of the problem mentioned above is related to the issues that were
raised in a very interesting and speculative way by Zwick. In his paper [30] on the measurement
problem in quantum mechanics, the difficulty of describing measurement at the level of quantum dynamics
is compared with, and found to be similar to, the incompleteness of certain axiomatic systems in
mathematics, discovered and elaborated by Gödel and others [31, 32]. According to Zwick, the
extensive parallelism between the physical and mathematical cases suggests the possibility that the
measurement process is self-referential, as was Gödel's special formula, and that measurement may
be undecidable within the dynamics (the formalism of the time-dependent Schrödinger equation),
occurring only at a meta-level of the formalism. In such a line of thinking, physical theory would
then have at least two levels: the measurement process would be described at a meta-level, but would be
undecidable on the base level, which is described by the dynamical law. At the same time, the
base level is inherently incomplete and no contradiction is generated. The dynamical law and all
the processes described by it are reversible, but the measurement process is irreversible; in this
way irreversibility would be present in a description of the measurement process within such a
two-level theory. Zwick quotes similar suggestions by Pattee [33] about the necessity of two levels of
structure and description for any prediction and control (i.e. measurement) process. Questions that
are raised about the irreversible behavior of systems governed deterministically by time-symmetric
equations of motion would then appear paradoxical only in the context of a single-level theory [33].

Without going more deeply into these issues, we note that in our application of MaxEnt to
the problem of predicting the time evolution of closed systems with Hamiltonian dynamics, certain
features of a two-level theory can be clearly recognized. In this simple model they appear only as
a result of our recognition of the incompleteness of our own information about microscopic dynamics.
From a pragmatic viewpoint, this allows us to discuss further the related issues concerning the interplay
between our knowledge and measurement constraints on the system and its "actual" dynamics.